Add initial specification for Big Endian #470
I think we need to clarify which relocations write data in big-endian order when the output is big-endian. My understanding is as follows: R_RISCV_{32,64}
Hi @rui314, I agree, thanks a lot for pointing that out.
332088f to
97ccbce
Compare
NOTE: Big-endian calling conventions follow the same rules as little-endian
calling conventions. The only difference is in the byte ordering of multi-byte
values in memory and registers. Register usage, argument passing, and return
value conventions remain the same.
Does this address #265 (comment)?
It's a draft for that: we already pass 2×XLEN-bit scalars the way we expect:
Scalars that are 2×XLEN bits wide are passed in a pair of argument registers,
with the low-order XLEN bits in the lower-numbered register and the high-order
XLEN bits in the higher-numbered register. If no argument registers are
available, the scalar is passed on the stack by value. If exactly one
register is available, the low-order XLEN bits are passed in the register and
the high-order XLEN bits are passed on the stack.
So I tried to add a paragraph to clarify this, with an example, and also added a NOTE describing the rationale.
The other thing I added covers variadic arguments that are 2×XLEN bits wide and aggregates with XLEN < size <= 2×XLEN.
I haven't checked the GCC implementation yet, but IIRC this may not match GCC's default big-endian behavior.
diff --git a/riscv-cc.adoc b/riscv-cc.adoc
index 0768360..037b47f 100644
--- a/riscv-cc.adoc
+++ b/riscv-cc.adoc
@@ -191,6 +191,16 @@ available, the scalar is passed on the stack by value. If exactly one
register is available, the low-order XLEN bits are passed in the register and
the high-order XLEN bits are passed on the stack.
+This register-pair ordering is defined in terms of value significance and is
+independent of endianness. For example, on RV32BE a 64-bit scalar returned
+in a0/a1 places bits [31:0] (the least-significant XLEN bits) in a0 and
+bits [63:32] in a1; memory layout remains big-endian.
+
+NOTE: Defining the register-pair ordering independent of endianness allows
+RV32_Zdinx and Zilsd paired load/store paths to be used directly for argument
+passing and return without extra swaps. Memory layout remains governed by the
+target endianness.
+
Scalars wider than 2×XLEN bits are passed by reference and are replaced in the
argument list with the address.
@@ -198,7 +208,10 @@ Aggregates whose total size is no more than XLEN bits are passed in
a register, with the fields laid out as though they were passed in memory. If
no register is available, the aggregate is passed on the stack.
Aggregates whose total size is no more than 2×XLEN bits are passed in a pair
-of registers; if only one register is available, the first XLEN bits are passed
+of registers with the fields laid out as though they were passed in memory:
+the lower-numbered register holds the lower-addressed XLEN-sized chunk of
+the aggregate and the higher-numbered register holds the next chunk;
+if only one register is available, the first XLEN bits are passed
in a register and the remaining bits are passed on the stack. If no registers are
available, the aggregate is passed on the stack. Bits unused due to
padding, and bits past the end of an aggregate whose size in bits is not
@@ -231,7 +244,10 @@ same manner as named arguments, with one exception. Variadic arguments with
even-numbered), or on the stack by value if none is available. After a
variadic argument has been passed on the stack, all future arguments will also
be passed on the stack (i.e. the last argument register may be left unused due
-to the aligned register pair rule).
+to the aligned register pair rule). For 2×XLEN scalars placed in an aligned
+register pair, the lower-numbered register holds the least-significant XLEN bits
+and the higher-numbered register holds the most-significant XLEN bits,
+regardless of endianness.
Values are returned in the same manner as a first named argument of the same
type would be passed. If such an argument would have been passed by
@kito-cheng Thanks, I agree!
We will check the GCC implementation and fix it there.
97ccbce to
ca43634
Compare
|
@aswaterman could you take a look at the big-endian calling convention part :)
Okay. For this basic test case:
$ cat test.c
long long test()
{
return 0x1;
}
GCC for LE generates:
$ riscv64-unknown-linux-gnu-gcc -c test.c -O2 -march=rv32gc -mabi=ilp32
$ riscv64-unknown-linux-gnu-objdump -d test.o
test.o: file format elf32-littleriscv
Disassembly of section .text:
00000000 <test>:
0: 4505 li a0,1
2: 4581 li a1,0
4: 8082 ret
And for BE, it generates:
$ riscv64-unknown-linux-gnu-gcc -c test.c -O2 -march=rv32gc -mabi=ilp32 -mbig-endian
$ riscv64-unknown-linux-gnu-objdump -d test.o
test.o: file format elf32-bigriscv
Disassembly of section .text:
00000000 <test>:
0: 4585 li a1,1
2: 4501 li a0,0
4: 8082 ret
So, basically, it does not follow the proposal here. We managed to come up with a fix (djtodoro/gcc@71a0f9f), which still needs some extra testing; with it applied, we now have:
$ riscv64-unknown-linux-gnu-gcc -c test.c -O2 -march=rv32gc -mabi=ilp32 -mbig-endian
$ riscv64-unknown-linux-gnu-objdump -d test.o
test.o: file format elf32-bigriscv
Disassembly of section .text:
00000000 <test>:
0: 4581 li a1,0
2: 4505 li a0,1
4: 8082 ret
|
I haven't had time to think this through yet, but make sure whatever you propose does the right thing for variadic functions. In particular, you want the argument-register layout to match the memory layout of arguments passed on the stack. This might encourage you to stick with GCC's current implementation, rather than making the change that @djtodoro mentioned. |
|
@aswaterman @kito-cheng Thanks for your comments! I checked variadic functions and found that the current psABI proposal text needs adjustment to match the actual GCC implementation after our fix (djtodoro/gcc@71a0f9f). Here is a small example:
$ cat variadic.c
#include <stdarg.h>
volatile unsigned int SN[2];
volatile unsigned int SV[2];
volatile unsigned int SR[2];
__attribute__((noinline))
void consume_named(unsigned long long x) {
SN[0] = (unsigned)x;
SN[1] = (unsigned)(x >> 32);
}
__attribute__((noinline))
void consume_var(const char *tag, ...) {
va_list ap; va_start(ap, tag);
unsigned long long x = va_arg(ap, unsigned long long);
SV[0] = (unsigned)x;
SV[1] = (unsigned)(x >> 32);
va_end(ap);
}
__attribute__((noinline))
unsigned long long ret64(void) {
return 0x1122334455667788ULL;
}
int main(void) {
consume_named(0x1122334455667788ULL);
consume_var("p", 0x1122334455667788ULL);
unsigned long long r = ret64();
SR[0] = (unsigned)r;
SR[1] = (unsigned)(r >> 32);
return 0;
}
Compile it as (asm files in attachment):
# this does not include our proposed fix: https://github.com/djtodoro/gcc/commit/71a0f9fc4bf9ff1b92ac434e362261ed16ff396b
$ riscv64-unknown-linux-gnu-gcc -S -O2 -march=rv32gc -mabi=ilp32 -mbig-endian variadic.c -o big_withoutfix_variadic.s
# with the fix
$ riscv64-unknown-linux-gnu-gcc -S -O2 -march=rv32gc -mabi=ilp32 -mbig-endian variadic.c -o big_afterfix_variadic.s
# LE
$ riscv64-unknown-linux-gnu-gcc -S -O2 -march=rv32gc -mabi=ilp32 variadic.c -o le_variadic.s
So, in BE (both before and after our GCC fix):
So the issue is: the current psABI proposal states that variadic 2×XLEN scalars should use "the lower-numbered register holds the least-significant XLEN bits... regardless of endianness."
For the psABI, I propose we clarify the distinction:
So, the proposal could be:
Please let me know your thoughts about this.
big_afterfix_variadic.s.txt
|
That sounds plausibly correct to me, but @kito-cheng should sanity-check it. |
|
Also, make sure to run through the GCC test suite with this scheme. Your simple test appears to catch the interesting case, but the test suite covers much more ground. |
|
@aswaterman Thanks!
Of course, I agree :) |
ping @kito-cheng :) any thoughts on this? :) |
The GNU GCC toolchain already supports big-endian for the RISC-V target. That support was merged without a corresponding change to the psABI document.
Here [0] is the initial PR adding big-endian support to the LLVM project, so let's add the documentation part as well.
[0] llvm/llvm-project#146534